[SPARK-54634][SQL] Add clear error message for empty IN predicate #53390

sahilkumarsingh · 2025-12-08T20:56:19Z

What changes were proposed in this pull request?

This PR addresses SPARK-54634.

It adds a user-friendly error message for SQL queries that contain an empty IN clause, such as: SELECT * FROM table WHERE col IN ()

Why are the changes needed?

When users write SQL with an empty IN clause, Spark currently raises a generic [PARSE_SYNTAX_ERROR] that points at the IN keyword. This leads users to believe that the keyword itself is malformed, whereas the actual problem is that the IN list contains no values. The current error message therefore does not tell the user what to fix.

This change provides a clear, actionable error message that explains the actual problem and suggests alternatives.

Example - Before:

org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'. SQLSTATE: 42601 (line 1, pos 33)

Example - After:

org.apache.spark.sql.catalyst.parser.ParseException:
[INVALID_SQL_SYNTAX.EMPTY_IN_PREDICATE] Invalid SQL syntax: IN predicate requires at least one value. Empty IN clauses like 'IN ()' are not allowed. Consider using 'WHERE FALSE' if you need an always-false condition, or provide at least one value in the IN list. SQLSTATE: 42000
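
For callers that build the IN list programmatically, the 'WHERE FALSE' suggestion in the new message can be applied before the query ever reaches the parser. The following is a minimal sketch under stated assumptions, not code from this PR: `InPredicateSketch` and `inPredicate` are hypothetical names, and the helper only accepts numeric values so that no quoting or escaping is needed.

```scala
// Hypothetical helper (not part of Spark or of this PR): renders an IN predicate
// for a numeric column and falls back to FALSE when the value list is empty.
object InPredicateSketch {
  def inPredicate(column: String, values: Seq[Long]): String =
    if (values.isEmpty) "FALSE" // avoids emitting "col IN ()"
    else column + " IN (" + values.mkString(", ") + ")"

  def main(args: Array[String]): Unit = {
    val nonEmpty = inPredicate("id", Seq(1L, 2L, 3L))
    val empty = inPredicate("id", Seq.empty)
    println("SELECT * FROM range(10) WHERE " + nonEmpty) // ... WHERE id IN (1, 2, 3)
    println("SELECT * FROM range(10) WHERE " + empty)    // ... WHERE FALSE
  }
}
```

The resulting string could then be passed to spark.sql(...) as in the example query in this description.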

Does this PR introduce any user-facing change?

Yes. An empty IN clause now raises INVALID_SQL_SYNTAX.EMPTY_IN_PREDICATE with an explanatory message instead of a generic PARSE_SYNTAX_ERROR.

Code executed: spark.sql("SELECT * FROM range(10) WHERE id IN ()").show()

Before output:

After output:

How was this patch tested?

I added unit tests in QueryParsingErrorsSuite.scala and SQL golden tests in predicate-functions.sql.
I also tested the build locally by running the query in spark-shell.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic) - used for code assistance, test generation, and documentation.

allisonwang-db

Thanks for making the error message better!

allisonwang-db · 2025-12-16T19:04:35Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala

      exception = parseException(sql2),
      condition = "PARSE_SYNTAX_ERROR",
-      parameters = Map("error" -> "'IN'", "hint" -> ""))
+      parameters = Map("error" -> "'INTO'", "hint" -> ""))


What's the error message before and after this change for this test case?

sahilkumarsingh · 2025-12-17

Hey Allison,

Here is the error message before and after this change for this test case:

Before:

scala> spark.sql("SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))").show()
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'. SQLSTATE: 42601 (line 1, pos 25)

== SQL ==
SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))
-------------------------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:285)
  at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:97)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:54)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(AbstractSqlParser.scala:93)
  at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$5(SparkSession.scala:492)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:148)
  at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:491)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:490)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:504)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:513)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:91)
  ... 42 elided

After:

scala> spark.sql("SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))").show()
org.apache.spark.sql.catalyst.parser.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'INTO'. SQLSTATE: 42601 (line 1, pos 36)

== SQL ==
SELECT * FROM S WHERE C1 IN (INSERT INTO T VALUES (2))
------------------------------------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:267)
  at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:78)
  at org.apache.spark.sql.execution.SparkSqlParser.super$parse(SparkSqlParser.scala:163)
  at org.apache.spark.sql.execution.SparkSqlParser.$anonfun$parseInternal$1(SparkSqlParser.scala:163)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:107)
  at org.apache.spark.sql.execution.SparkSqlParser.parseInternal(SparkSqlParser.scala:163)
  at org.apache.spark.sql.execution.SparkSqlParser.parseWithParameters(SparkSqlParser.scala:70)
  at org.apache.spark.sql.execution.SparkSqlParser.parsePlanWithParameters(SparkSqlParser.scala:84)
  at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$6(SparkSession.scala:573)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:148)
  at org.apache.spark.sql.classic.SparkSession.$anonfun$sql$4(SparkSession.scala:572)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:563)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:591)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:682)
  at org.apache.spark.sql.classic.SparkSession.sql(SparkSession.scala:92)
  ... 42 elided

sahilkumarsingh · 2025-12-25

Hey @allisonwang-db, could you check this output and let me know? Thanks!

allisonwang-db

Thanks for the fix. Much better error message.

sahilkumarsingh · 2025-12-28T11:19:42Z

Thanks for approving the changes, @allisonwang-db. Do you happen to know when this PR might be merged?

allisonwang-db · 2026-01-07T01:20:42Z

cc @cloud-fan

cloud-fan · 2026-01-07T03:55:08Z

sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala

+      errorClass = "INVALID_SQL_SYNTAX.EMPTY_IN_PREDICATE",
+      messageParameters = Map(
+        "alternative" -> ("Consider using 'WHERE FALSE' if you need an always-false condition, " +
+          "or provide at least one value in the IN list.")),


Why pass the alternative as an error parameter instead of just putting it in the error message template?

sahilkumarsingh · 2026-01-07

Looking back, this alternative can go directly into the error message template. Shall I make this change?

sahilkumarsingh · 2026-01-07

Done, please check.
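
The review point above can be illustrated with a small stand-alone sketch. It assumes a hypothetical `render` function that substitutes `<name>` placeholders in a template; this is not Spark's actual error framework, and the template wording is only adapted from the message quoted in the PR description. It shows that a parameter whose value never varies produces the same message as text written directly into the template, which is why the parameter is unnecessary.

```scala
// Hypothetical sketch, not Spark's error framework: compares a static
// "alternative" passed as a parameter with the same text inlined in the template.
object ErrorTemplateSketch {
  // Replaces each <name> placeholder in the template with its parameter value.
  def render(template: String, params: Map[String, String]): String =
    params.foldLeft(template) { case (msg, (name, value)) =>
      msg.replace("<" + name + ">", value)
    }

  val prefix: String =
    "IN predicate requires at least one value. " +
      "Empty IN clauses like 'IN ()' are not allowed. "

  val alternative: String =
    "Consider using 'WHERE FALSE' if you need an always-false condition, " +
      "or provide at least one value in the IN list."

  // Before the review: the constant text travels as a message parameter.
  val parameterized: String =
    render(prefix + "<alternative>", Map("alternative" -> alternative))

  // After the review: the constant text lives in the template, no parameters.
  val inlined: String = render(prefix + alternative, Map.empty)

  def main(args: Array[String]): Unit =
    println(parameterized == inlined) // true: both render the same message
}
```

Keeping constant text in the template means call sites pass only values that actually vary from one error to the next.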

cloud-fan · 2026-01-08T07:49:14Z

thanks, merging to master!

[SPARK-54634][SQL] Add clear error message for empty IN predicate

17d4b37

github-actions bot added the SQL label Dec 8, 2025

sahilkumarsingh added 2 commits December 9, 2025 10:58

Updated the PlanParserSuite test

b25783f

Updated the PlanParserSuite test again and predicate-functions.sql.out

1caa5c6

allisonwang-db reviewed Dec 16, 2025

allisonwang-db approved these changes Dec 26, 2025

cloud-fan reviewed Jan 7, 2026

sahilkumarsingh added 2 commits January 7, 2026 11:20

Moved static alternative message into error template for EMPTY_IN_PREDICATE

415fb9c

Updated the golden files

3a6ec29

cloud-fan approved these changes Jan 7, 2026

cloud-fan closed this in 9c67509 Jan 8, 2026
